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Gaussian Distributions 

Frederic Pascal, Lionel Bombrun, Jean- Yves Tourneret and Yannick Berthoumieu. 



Abstract — Due to its heavy-tailed and fully parametric form, 
the multivariate generalized Gaussian distribution (MGGD) has 
been receiving much attention for modeling extreme events 
in signal and image processing applications. Considering the 
estimation issue of the MGGD parameters, the main contribution 
of this paper is to prove that the maximum likelihood estimator 
(MLE) of the scatter matrix exists and is unique up to a scalar 
factor, for a given shape parameter /3 G (0, 1). Moreover, an 
estimation algorithm based on a Newton-Raphson recursion is 
proposed for computing the MLE of MGGD parameters. Various 
experiments conducted on synthetic and real data are presented 
to illustrate the theoretical derivations in terms of number of 
iterations and number of samples for different values of the 
shape parameter. The main conclusion of this work is that the 
parameters of MGGDs can be estimated using the maximum 
likelihood principle with good performance. 

Index Terms — Multivariate generalized Gaussian distribution, 
covariance matrix estimation, fixed point algorithm. 



L Introduction 

UNIVARIATE and multivariate generalized Gaussian dis- 
tributions (GGDs) have received much attention in the 
literature. Some properties of these distributions have been 
reported in several papers such as [l]-[3]. These properties 
include various stochastic representations, simulation methods 
and probabilistic characteristics. GGDs belong to the family 
of elliptical distributions (EDs) [4], [5], originally introduced 
by Kelker in [6] and studied in [7], [8]. Interestingly, for 
particular values of the shape parameter /3, some GGDs, v- 
spherical distributions [9] and spherically invariant random 
vector (SIRV) distributions [10] can share the same definitions, 
as illustrated in Fig. 1. However, in general, these families of 
distributions are different. 

Multivariate GGDs (MGGDs) have been used intensively in 
the image processing community. Indeed, including Gaussian 
and Laplacian distributions as special cases, MGGDs are 
potentially interesting for modeling the statistical properties 
of various images or features extracted from these images. In 
particular, the distribution of wavelet or curvelet coefficients 
has been shown to be modeled accurately by GGDs [11]-[14]. 
This property has been exploited for many image processing 
applications including image denoising [15]-[18], context- 
based image retrieval [19], [20], image thresholding [21] 
or texture classification in industrial problems [22]. Other 
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Fig. 1. Families of EDs, SIRV distributions and MGGDs. 



applications involving GGDs include radar [23], video coding 
and denoising [24]-[26] or biomedical signal processing [25], 
[27], [28]. Finally, it is interesting to note that a specific 
attention has been recently devoted to complex GGDs in [29], 
[30]. 

Estimating the parameters of GGDs is thus an important 
issue. Classical estimation methods that have been investigated 
for univariate GGDs include the maximum likelihood (ML) 
method and the method of moments [31]. In the multivariate 
context, MGGD parameters can be estimated by a least- 
squares method as in [17] or by minimizing a distance 
between the histogram of the observed data and the theoreti- 
cal probabilities associated with the MGGD [32]. Estimators 
based on the method of moments and on the ML method have 
also been proposed in [33], [34]. The main contribution of this 
paper is to show that for a given shape parameter j3 belonging 
to (0, 1), the maximum likelihood estimator (MLE) of the 
scatter matrix IM exists and is unique up to a scalar factor. 
An iterative algorithm based on a Newton-Raphson procedure 
is then proposed to compute the MLE of IVi. 

Several works have analyzed covariance matrix estimators 
defined under different modeling assumptions. On the one 
hand, fixed point (FP) algorithms have been derived and 
analyzed in [35], [36] for SIRVs. On the other hand, in the 
context of robust estimation, the properties of M-estimators 
have been studied by Maronna in [37]. Unfortunately, 
Maronna's conditions are not fully satisfied for MGGDs 
(see remark n.3). This paper shows that despite the non- 
applicability of Maronna's results, the MLE of MGGD 
parameters exists, is unique and can be computed by an FP 
algorithm. Although the methodology adopted in this paper 
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has some similarities with the one proposed in [35], [36], 
there are also important differences which require a specific 
analysis (see for instance remark III.l). 

The paper is organized as follows. Section II defines the 
MGGDs considered in this study and derives the MLEs of their 
parameters. Section III presents the main theoretical results 
of this paper while a proof outline is given in Section IV. 
For presentation clarity, full demonstrations are provided in 
the appendices. Section V is devoted to simulation results 
conducted on synthetic and real data. The convergence speed 
of the proposed estimation algorithm as well as the bias 
and consistency of the scatter matrix MLE are first inves- 
tigated using synthetic data. Experimentations performed on 
real images extracted from the VisTex database are then 
presented. Conclusions and future works are finally reported 
in Section VI. 

II. Problem formulation 

A. Definitions 

The probability density function of an MGGD in is 
defined by [38] 

1 



p(x|]V[,m,/ 



\M\^ 



(1) 



for any x G M^, where IM is a p x p symmetric real scatter 
matrix, x^ is the transpose of the vector x, and h{-) is a 
so-called density generator defined by 

hm,(3 {y) = — r-\ — ^^exp ( I (2) 



■fr(^)2* 



^ m2 



for any y G where m and (3 are the MGGD scale 

and shape parameters. The matrix M. will be normalized in 
this paper according to Tr(]V[) = p, where Tr(]V[) is the 
trace of the matrix ]V[. It is interesting to note that making 
P = 0.5 in (1) leads to the multivariate Laplace distribution 
whereas {3 = 1 corresponds to the multivariate Gaussian 
distribution. Moreover, when P tends toward infinity, the 
MGGD is known to converge in distribution to a multivariate 
uniform distribution (see (3)). 

B. Stochastic representation 

Let X be a random vector of W distributed according to an 
MGGD with scatter matrix E = rriM. and shape parameter {3. 
Gomez et ah have shown that x admits the following stochastic 
representation [1] 



r 5]2 u 



(3) 



where = means equality in distribution, u is a random vector 
uniformly distributed on the unit sphere of R^, and r is a 
scalar positive random variable such that 



.2/5 



,2 



(4) 



where r(a, 6) is the univariate gamma distribution with pa- 
rameters a and h (see [39] for definition). 



C. MGGD parameter estimation for known (3 

Let (xi,...,XAr) be N independent and identically dis- 
tributed (i.i.d.) random vectors distributed according to an 
MGGD with parameters ]V[,m and /3. This section studies 
estimators of IVE and m based on (xi, . . . , x^v) for a known 
value of 13 G (0,1)^ The MGGD is a particular case of 
elliptical distribution that has received much attention in the 
literature. Following the results of [40] for real elliptical 
distributions, by differentiating the log-likelihood of vectors 
(xi, . . . , xat) with respect to IM, the MLE of the matrix ]V[ 
satisfies the following FP equation 



where gm,(3{y) = dhm,(3{y) /dy. In the particular case of 
an MGGD with known parameters m and /3, straightforward 
computations lead to 

M=J^E ,.,J:^^.-, . (6) 



^=i(xfM- ^ 

When the parameter m is unknown, the MLEs of M. and m are 
obtained by differentiating the log-likelihood of (xi, . . . , xat) 
with respect to M. and m yielding 



M 



pN 



N 

E 

N 



(xfM- 



M" 



(7) 



(8) 



After replacing m in (6) by its expression (8), the following 
result can be obtained 

i=i yi^yi i^Vj 



with yi = xf ]V[~^Xi. As mentioned before and confirmed 
by (9), ]V[ can be estimated independently from the scale 
parameter m. Moreover, the following remarks can be made 
about (9). 

Remark II. 1 

When (3 = 1, Eq. (9) is close to the sample covariance 
matrix (SCM) estimator (the only difference between the SCM 
estimator and (9) is due to the estimation of the scale parameter 
that equals 1 for the multivariate Gaussian distribution). For 
P = 0, (9) reduces to the FP covariance matrix estimator that 
has received much attention in [40]-[42]. 

Remark II.2 

Equation (9) remains unchanged if M. is replaced by aM. 
where a is any non-zero real factor Thus, the solutions of (9) 
(when there exist) can be determined up to a scale factor a. The 
normalization Tr{M.) = p will be adopted in this paper and 
will be justified in the simulation section. 

^We note here that most values of /3 encountered in practical applications 
belong to the interval (0, 1). 
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Remark II.3 

Let us consider the function fi defined by 



My) 



Np 



,1-/3 



(10) 



y + Ciy 

where Ci is a positive constant independent of y (the index i is 
used here to stress the fact that ci = ^ changes with i but 

does not depend on yj. Equation (9) can be rewritten as 



M 



1 ^ 

i=l 



(11) 



Roughly speaking^ , fi satisfies Maronna's conditions (recapped 
below, see [37, p. 53] for more details) for any j3 G (0,1) except 
the continuity aty = 0. 

Maronna 's conditions for a function / : M ^ R 

(i) / is non-negative, non increasing, and continuous on 
[0,oo). 

(ii) Let iIj{s) = s f{s) and K = sup?/^(s). The function 

s>0 

is non decreasing and strictly increasing in the interval 
defined by < K withp < K < oo. 

(iii) Let Pn (•) denotes the empirical distribution of 
xi,...,X7v. There exists a > such that for all 
hyperplane W with dim{W) < p — I 

P 



PN{W)<l---a. 



(12) 



Because of non continuity of fi around 0, the properties of Ad- 
estimators derived by Maronna cannot be applied directly to the 
estimators of the MGGD parameters. The objective of the next 
section is to derive similar properties for the estimator of M 
defined by the FP equation (9). 

D. MGGD parameter estimation for unknown j3 

When the shape parameter p of the MGGD is unknown, 
the MLE of M, m and {3 is obtained by differentiating the 
log-HkeHhood of (xi, . . . , x^v) with respect to M and m and 

i.e., by combining (7) and (8) with the following relation 



a{0) 



pN 



i=l 



E[2/fln(yi)' 



N ■ 



2/3 [pN^^y^^ 



pN_ 







. V2/3 



In 2 



(13) 



where ^(•) is the digamma function. Equation (9) shows 
that M and /3 can be estimated independently from the scale 
parameter m. 

III. Statements of the main results 

As the estimation scenario presented in the previous section 
has some similarities with the FP estimator studied in [35], 
similar results about the estimator existence, uniqueness (up 
to a scale factor) and FP algorithm convergence are expected 

^Actually, Maronna's function depends only on the i*^ sample and not on 
all the samples as it is the case here! 



to be true. This section summarizes the properties of the FP 
estimator defined by (9) for a known value of /3 G (0, 1) (all 
proofs are provided in the appendices to simplify the reading). 
The case of an unknown value of (3 will be discussed in the 
simulation section. 



A. Notations 

For any positive integer n, |1, n] denotes the set of integers 
{1, . . . , n}. For any vector x G M^, ||x|| denotes the Euclidean 
norm of x such as ||x|p = x^x, where x^ is the transpose 
of X. Throughout the paper, we will use several basic results 
about square matrices, especially regarding the diagonalization 
of real symmetric and orthogonal matrices. We invite the 
reader to consult [43] for details about these standard results. 
Denote as Mp(R) the set of p x p real matrices, SO{p) the 
set of pxp orthogonal matrices and M.^ the transpose of M.. 
The identity matrix of Mp{R) will be denoted as Ip. Several 
subsets of matrices used in the sequel are defined below 

* P is the subset of Mp{R) defined by the symmetric 
positive definite matrices; 

* P is the closure of V in Mp{R), i.e., the subset of Mp{R) 
defined by the symmetric non negative definite matrices; 

* For all a > 



V{a) = {MeV_ \ \\M\\=a}, 
vla) = {MeV\ \\M\\=a}, 

where V{a) is a compact subset of Mp(R), 
the Frobenius norm. 



(14) 



being 



For ]V[ G P, we introduce the open-half line spanned by ]V[ de- 
fined by >Cm = {AIM, A > 0}. Note that the order associated 
with the cone structure of V is called the Loewner order for 
symmetric matrices of Mp{R) and is defined as follows. For 
any pair of two symmetric pxp real matrices (A, B), A < B 
(A < B respectively) means that the quadratic form defined 
by B — A is non negative (positive definite respectively), i.e., 
for all non zero x G R^, x^(A — B)x > 0,(> respectively). 
Using that order, one has M. e V (e V respectively) if and 
and only if ]V[ > (]V[ > respectively). 

This section will make use of the following two applications 



M 



\{0} 



|M|- (tyf) 



(15) 



and 



fx ' ^ 



M 



V 



1 ^ 



Np 



N 



i=i yi 



(16) 



where x = (xi, . . . , xat), yi = xflVE-^x^ and (3 G (0,1). 
The function F^ is the likelihood of (xi, . . . ,XAr) in which 
the parameter m has been replaced by its estimator (8), up to 



4 



SUBMITTED TO IEEE TRANS. ON SIGNAL PROCESSING 



a multiplicative constant and a power factor. Indeed: 



N 



J]p(x,|M,m,/3) 



/^r(f) 



N 



22? 



exp 



N/2 



pN 
~2P 



It is clear that is homogeneous of degree zero whereas 
/y is homogeneous of degree one, i.e., for all A > and 
MeV, one has 

Fy(AM)=Fy(M), /y(AM)=A/y(M). 

In order to understand the relationships between the two 
functions and f^, we can compute the gradient of F^ at 
M. eV. Straightforward computations lead to 

VFy (M) = Fy (M) [/^(M) - M] (17) 

Clearly M is an FP of if and only if M is a critical point 
of the vector field defined by VF^ on V, i.e., VFy(M) = 0. 

Remark III.l 

As displayed in Fig. 1, there is a common part between the 
MGGDs and the SIRV distributions (that are both specific 
elliptical distributions). However, all MGGDs are not SIRV 
distributions and conversely. As a consequence, the FP equation 
(9) associated with the MGGDs relies on the function which 
differs from the FP equation studied in [35] (which corresponds 
to the particular case /3 = 0) and from that of [36] which 
corresponds to SIRVs with random multipliers r. Similarly, 
the shape of the function F^ differs significantly from the 
likelihoods studied in [35] and [36] that are defined as products 
of texture based integrals. 

In the sequel, we also use for n > 1 to denote the n-th 
iterate of /, i.e., := /o ... o/, where / is repeated n times. 
We also adopt the standard convention := Idi^, where Idj) 
is the identity function defined in V. To finish this section, we 
introduce an important assumption about the vectors for 
l<i<N 

• {H)\ For any set of p indices belonging to |1,A^] and 
satisfying < ... < i{p), the vectors x^(i), . . . , x^(p) 
are linearly independent. 
This hypothesis is a key assumption for obtaining all our 
subsequent results. Hypothesis {H) has the following trivial 
but fundamental consequence that we state as a remark 

Remark III.2 

For all n vectors x^(i) , . . . , ^a^n) ^i^h l<n<p,l<i<N, 
the vector space generated by x^(i) , . . . , x^(^) has dimension n. 

B. Contributions 

The contributions of this paper are summarized in the 
following theorems whose proofs are outlined in the next 
section. 

Theorem III.l ^ 

For a given value of (3 G (0, 1), there exists M.fp G V with unit 
norm such that, for all a > 0, admits a unique FP of norm 



a > equal to a M^p. Moreover, F^ reaches its maximum in 
^Mfp' ^^^^ half-line spanned by Mpp. 

Consequently, Mpp is the unique positive definite p x p 
matrix of norm one satisfying 



N 



Mfp=pY^ 



i=i Vi ^ Vi 



(18) 



where yi = x/ M^pX^. 



Remark III.3 

Theorem III. 1 relies on the fact that F^ reaches its maximum in 
V. In order to prove this result, the function F^ is continuously 
extended by the zero function on the boundary ofV, except for 
the zero matrix. Since F^ is positive and bounded in V, we can 
conclude (see Appendix A for details). 

As a consequence of Theorem III.l, the following result can 
be obtained. 

Theorem III.2 

Let S be the discrete dynamical system defined on V by the 
recursion 



M 



(19) 



Then, for all initial condition Mq G V, the resulting sequence 
(M/e)/c>o converges to an FP of f^, i.e., to a point where F^ 
reaches its maximum. 

Theorem III. 2 can be used to characterize numerically the 
points where F^ reaches its maximum and the value of that 
maximum. Note that the algorithm defined by (19) does not 
allow the norm of the FP to be controlled. Therefore, for 
practical convenience, a slightly modified algorithm can be 
used in which a M-normalization is applied at each iteration. 
This modified algorithm is proposed in the following corollary 



Corollary III.l 

The recursion 



initialized by 



M' 



rr[/x(M',)] 



Mn 



Tr (Mo) 



(20) 



(21) 



yields a sequence of matrices {Mq, . . . , M'^.} which converges 
to the FPM.PP up to a scaling factor Moreover, the matrices 
{M'q, . . . , M'^.} are related to {Mq, . . . , M^} by 



M' 



Mi 



Tr(Mi)' 



l<i<k. 
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IV. Proof outline 

This section provides the proofs of Theorems III.l and IIL2. 
Each proof is decomposed into a sequence of lemmas and 
propositions whose arguments are postponed in the appen- 
dices. For the proofs that can be directly obtained from those 
of [35], we refer to [35]. In these cases, the differences due to 
the definitions of the function and the MGGD model for 
the observed vectors x^, for i = 1,...,A^, imply only slight 
modifications. 

A. Proof of Theorem III.l 

The proof of Theorem III.l is the consequence of several 
propositions whose statements are listed below. The first 
proposition shows the existence of an FP satisfying (9). 

Proposition IV. 1 

The supremum of in V is finite and is reached at a point 
M.FP ^ V with \\M.Fp\\ = 1. Therefore, admits the open- 
half Une C^^^ as fixed points. 

Proof: See Appendix A. ■ 

It remains to show that there is no other FP of than those 
belonging to C^^^ . For that purpose, it is sufficient to show 
that all FPs of are collinear. However, Corollary V.l of [35] 
indicates that all FPs of are collinear if all the orbits of 
are bounded in V. We recall here that the orbit of associated 
with ]V[ G P is the trajectory of the dynamical system S 
defined in (19) starting at IVE (See [44] for more details about 
orbits in dynamical systems). Moreover, according to [35], 
when a function admits an FP, every orbit of is bounded 
if the following proposition is verified. 

Proposition IV.2 

The function verihes the following properties 

. (PI) : For all M,Q e V, ifM < Q, then /^(M) < 

/x(Q) ^^^^ wJf^ strict inequalities); 
. (P2) : for all IVE, Q G then 

/^(M + Q)>/^(M) + /^(Q), (22) 

where equality occurs if and only ifM. and Q are collinear. 

Proof: Since the function used in this paper differs 
from the one used in [35], a specific analysis is required. It is 
the objective of Appendix B. ■ 

To summarize. Proposition IV. 1 establishes the existence of 
matrices satisfying the FP equation (9) while Proposition IV.2 
together with the results of [35] can be used to show that there 
is a unique matrix of norm 1 satisfying (9). 

B. Proof of Theorem III. 2 

In order to prove Theorem III. 2, we have to show that each 
orbit of converges to an FP of For that purpose, we 
consider for ailM. eV the positive limit set ujCM) associated 
with M., i.e., the set of cluster points of the sequence {'Mk)k>o 
when k tends to infinity, where IM/c+i = fxi^k) and JVEq = 



M.. Since the orbit of associated with M. is bounded in V, 
the set cj(]V[) is a compact of V and is invariant by for 
all P e oj{M), /x(P) e oj{M). It is clear that the sequence 
{'Mk)k>o converges if and only if ujCM) reduces to a single 
point. According to [35], uj{M.) reduces to a single point if 
the following proposition is satisfied. 

Proposition IV.3 

The function is eventually strictly increasing, i.e., for all 
Q, P G P such that Q>P andQ^P , then 

/^(Q) > /^(P). (23) 

Proof: Since the function used in this paper differs 
from the one used in [35], a specific analysis is required. It is 
the objective of Appendix C. ■ 

V. Simulations 

This section presents simulation results to evaluate the 
performance of the MLE for the parameters of MGGDs. 
The first scenario considers i.i.d. p dimensional data vectors 
(xi, . . . , xat) distributed according to an MGGD. These vec- 
tors have been generated using the stochastic representation (3) 
with a matrix M. defined as 

M{iJ) = pl^-^l fori J G [0,p- 11. (24) 

In the following, 1000 Monte Carlo runs have been used in 
all experiments to evaluate the performance of the proposed 
estimation algorithms. Before analyzing the performance of 
the FP estimators based on (9), we illustrate the importance 
of the normalization Tr (]V[) = p advocated in this paper. 

A. Influence of the normalization 

The main advantage of the normalization (i.e., decomposi- 
tion of T, as the product m xM., and trace constraint for the 
matrix M) concerns the convergence speed of the algorithm. 
To illustrate this point. Fig. 2 shows the evolution of the 
criterion D{k) 

^(^) = ^^^H^' (25) 
IIAfcll 

where = for the blue curves and = irikM.k for 
the red curves. || • || is the Frobenius norm and A/^ is the 
estimator of A at step k. As observed, the convergence speed is 
significantly faster when a normalization condition is imposed 
at each iteration of the algorithm. 

Fig. 3 shows the evolution of the estimated bias and 
consistency of A (the plain curves correspond to A = rh'WL 
whereas A = for the dotted lines) versus the number of 
samples when (3 is not estimated (the parameters are /3 = 0.2, 
p = O.S and p = 3). The estimated bias of A is defined as 
||A — A|| where the operator A is the empirical mean of the 
estimated matrices 

A=-^A(z). (26) 

For a given sample size, the experiment are performed / times 
(/ = 100 in the following). Note that the bias criterion (26) 
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with normalization {p estimated) 
with normalization {p known) 
without normalization {p estimated) 
without normalization {p known) 




10 10 

Number k of iterations 

Fig. 2. Variations of D(k) versus number of iterations for p = 3, /5 = 0.2 
and p = 0.8. 



was used in [42] for assessing the performance of matrix 
estimators. The estimated consistency of A is verified by 
computing 1 1 A — A| |. As observed, the estimation performance 
is the same when a normaUzation constraint for the scatter 
matrix is imposed or not. 



10" 



10^ 



10" 



10"' 



-Bias (with normalization) 

- Bias (without normalization) 
-Consistency (with normalization) 

- Consistency (without normalization) 




500 1000 1500 

Number N of samples 



2000 



Fig. 3. Influence of the normalisation of the scatter matrix on the estimation 
performance: estimated bias and consistency versus number of samples A^. 



A similar comment can be made for the shape parameter 
/3 when this parameter is estimated (see Fig. 4). The Fisher 
information matrix has been recently derived for the parame- 
ters of MGGDs [33]. It has been shown that this matrix only 
depends on the number N of secondary data and the shape 
parameter (3. The Cramer-Rao lower bounds (CRLBs) for 
the MGGD parameters can then be obtained by inverting the 
Fisher information matrix. These CRLBS provide a reference 
(in terms of variance or mean square error) for any unbiased 
estimator of the MGGD parameters. As observed in Fig. 4, the 
variance of $ is very close to the Cramer-Rao lower bound 
for normalized or non-normalized scatter matrices. 



10"' 



10" 



> 10" 



10" 



— with normalization 

without nornnalization 

— Cramer-Rao lower bound 



500 1000 1500 
Number N of samples 



2000 



Fig. 4. Variance of $ versus number of samples N. 

To summarize, the normalization of the scatter matrix (de- 
composition of T, as the product m x M, and trace constraint 
for the matrix M) does not affect the statistical properties of 
the MLE. However, it ensures an increased convergence speed 
of the algorithm. Note also that a similar normalization was 
proposed in [41, Eq. (15)]. 

B. Known shape parameter 

1 ) Convergence of the scatter matrix MLE: Fig. 5 shows 
some convergence results associated with the MLE of the 
scatter matrix M. These results have been obtained for p = 3 
(length of each vector x^), (3 = 0.2 (shape parameter) and 
p = 0.8. Convergence results are first analyzed by evaluating 
the sequence of criteria C{k) defined as 



C{k) = 



|M 



/e+l 



(27) 



Fig. 5. (a) shows examples of criteria C{k) obtained for various 
initial matrices Mq ("moments" stands for Mq equal to the 
estimator of moments [33], "identity" stands for JVEq = Ip and 
"true" corresponds to JVEq = M). After about 20 iterations, all 
curves converge to the same values. Hence, the convergence 
speed of the proposed algorithm seems to be independent of 
its initialization. Fig. 5.(b) shows the evolution of criteria C{k) 
for various numbers N of secondary data. It can be observed 
that the convergence speed increases with TV as expected. 

2) Bias and consistency analysis: Fig. 6. (a) shows the 
estimated bias of M. for different values of (3 (precisely for 
/3g {0.2, 0.5, 0.8}). As observed, the bias converges very fast 
to a small value which is independent of /3. 

Fig. 6.(b) presents some results of consistency for the 
proposed estimator. Here, a plot of 1 1 IM — ]V[ 1 1 as a function 
of the number of samples TV is shown for different values of 
(3 (0.2, 0.5 and 0.8). It can be noticed that this criterion tends 
to a small value when N increases for all values of /3. 

C. Unknown shape parameter 

When (3 is unknown, the MLE of M. and (3 is defined by 
(9) and (13). IfM. would be known, one might think of using 
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a Newton-Raphson procedure to estimate {3. The Newton- 
Raphson recursion based on (13) is defined by the following 
recursion 



(28) 



where $ri is an estimator of (3 at step n, and the function a(/3) 
has been defined in (13). In practice, when the parameters M 
and (3 are unknown, we propose the following algorithm to 
estimate the MGGD parameters. 

Algorithm 1 MLE for the parameters of MGGDs 
1: Initialization of (3 and M. 
2: for k = 1 : N_iter_max do 

3: Estimation of M using one iteration of (9) and nor- 
malization. 

4: Estimation of /3 by a Newton-Raphson iteration com- 
bining (13) and (28). 
5: end for 

6: Estimation of m using (8). 



Fig. 6. (a) Estimated bias for different values of (3, (b) estimated consistency 
for different values of /5. 



10 



10" 



10 



10" 



-2 



-Bias (p estimated) 
Bias (p known) 
-Consistency (p estimated) 
Consistency (p known) 




Number N of samples ^ ^ q 

Fig. 7. Estimated bias and consistency for /5 = 0.2. 



10 

4 



1) Bias and consistency analysis: Fig. 7 shows a compar- 
ison of the algorithm performance when the shape parameter 
13 is estimated (soHd Hne) and when it is known (dashed Hne). 
As observed, the simulation results obtained with the proposed 
algorithm are very similar to those obtained for a fixed value 
of p. 



2) Shape parameter 13: A comparison between the vari- 
ances of estimators resulting from the method of moments 
and the ML principle as well as the correspondings CRLBs 
are depicted in Fig. 8 (versus the number of samples and the 
value of P). Fig. 8. (a) was obtained for/3 = 0.2, p = 0.8 and 
j9 = 3, while Fig. 8.(b) corresponds to TV = 10 000, p = 0.8 



8 



SUBMITTED TO IEEE TRANS. ON SIGNAL PROCESSING 



and p = 3. The ML method yields lower estimation variances 
compared to the moment-based approach, as expected. More- 
over, the CRLB of (3 is very close to the variance of $ in all 
cases, illustrating the MLE efficiency. 



-Maximunn likelihood 
-Moments 

-Cramer-Rao lower bound 




5 10 
Number N of samples ^ ^ q4 

(a) 



illustrate the potential of MGGDs for modeling color cue 
dependences for texture images. 

In the next experiments, we have generated 3-dimensional 
data vectors (xi , . . . , x^v) according to an MGGD with param- 
eters given in Table I. Fig. 11 shows the MLE performance 
for these parameters resulting from real texture images. As 
observed in Fig. 11, the performance of the MLE of M 
is very similar when (3 is estimated or not (illustrating the 
unbiasedness and consistency properties^ of the scatter matrix 
estimator and the MLE efficiency of /3 that have also been 
observed for synthetic data). 






(b) 

Fig. 8. Estimation performance for parameter (3. (a) Variance of (3 versus 
number of samples for /3 = 0.2, p = 0.8 and p = 3, (b) Variance of $ 
versus ^ for N = 10 000, p = 0.8 and p = 3. 

D. Experiments in a real-world setting 

In this part, we propose to evaluate the performance of 
the MLE for the parameters of MGGDs encountered in a 
real-world application. MGGDs have been used successfully 
for modeling the wavelet statistics of texture images [34], 
[45]. In order to analyze the potential of MGGDs for texture 
modeling, we have considered two images from the VisTex 
database [46], namely the "Bark.OOOO" and "Leaves.0008" 
images displayed in Fig. 9. The red, green and blue channels 
of these images have been filtered using the stationary wavelet 
transform with the Daubechies db4 wavelet. For the the first 
scale and orientation, the observed vector x (of size p = 3) 
contains the realizations of the wavelet coefficients for each 
channel of the RGB image. MGGD parameters have then been 
estimated using the proposed MLE for an unknown shape 
parameter (Algorithm 1), i.e., using the algorithm described 
in Section V-C. The results are reported in Table I. Fig. 10 
compares the marginal distributions of the wavelet coefficients 
with the estimated MGGD and Gaussian distributions for the 
first subband of the red, green and blue channels (the top 
figures correspond to the image "Bark.OOOO" whereas the 
bottom figures are for the image "Leaves.0008"). These results 



(a) (b) 

Fig. 9. Images from the VisTex database, (a) Bark.OOOO and (b) Leaves.0008. 

VI. Conclusion 

This paper has addressed the problem of estimating the 
parameters of multivariate generalized Gaussian distributions 
using the maximum likelihood method. For any shape param- 
eter (3 e (0, 1), we have proved that the maximum likelihood 
estimator of the scatter matrix exists and is unique up to a 
scalar factor. By setting to zero the partial derivative with 
respect to the scale parameter of the likelihood associated 
with generalized Gaussian distributions, we obtain a closed 
form expression of the scale parameter as a function of the 
scatter matrix. The profile likelihood is then obtained by 
replacing this expression in the likelihood. The existence of 
the maximum likelihood estimator of the scatter matrix was 
proved by showing that this profile likelihood is positive, 
bounded in the set of symmetric positive definite matrices 
and equals zero on the boundary of this set. We have also 
proved that for any initial symmetric positive definite matrix, 
the sequence of matrices satisfying a fixed point equation 
converges to the unique maximum of this profile likelihood. 
Simulations results have illustrated the unbiasedness and con- 
sistency properties of the maximum likelihood estimator of 
the scatter matrix. Surprisingly, these unbiasedness and consis- 
tency properties are preserved when the shape parameter of the 
generalized Gaussian distribution is estimated jointly with the 
other parameters. Further works include the use of multivariate 
generalized Gaussian distributions for various remote sensing 
applications including change detection, image retrieval and 
image classification. 

Appendix A 
Proof of Proposition IV. 1 

First, it is interesting to note that if M.fp is an FP of 
XM.FP is also an FP of for all A > 0. This property is a 
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TABLE I 

Estimated MGGD parameters for the first subband of the Bark. 0000 and Leaves. 0008 images. 



Image 


rh 


/3 


M 


Bark 0000 


0.036 


0.328 




0.988 0.992 0.883 
0.992 1.131 0.922 
0.883 0.922 0.881 




Leaves 0008 


0.054 


0.265 




0.935 0.966 0.871 
0.966 1.074 0.976 
0.871 0.976 0.991 
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XR 

(d) 
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(e) 



(f) 



Fig. 10. Marginal distributions of the wavelet coefficients with the estimated MGGD and Gaussian distributions of the first subband for the red, green and 
blue channels of the Bark.OOOO (a,b,c) and Leaves. 0008 images (d,e,f). 

direct consequence of the homogeneity of degree one of which was demonstrated in [35] and concludes the proof. ■ 
We start by demonstrating the following lemma. 

Lemma A.l the proof of Proposition IV. 1 

The function can be extended as a continuous function of The end of the proof of Proposition IV. 1 is similar to the 



V\{0} such that (M) = for all non invertible matrix M G 
V\{0}. 



one given in [35]. Since F^ is defined and continuous in the 
compact T){1), this function reaches its maximum in T){1) at 
a point denoted SisM.Fp. Since F^ is strictly positive in P(l) 
Proof: It is enough to show that, for aU non invertible ^nd equals in V{1)\V{1), the inequality F^{Mfp) > 
M G P\{0}, and all sequence {Qk)k>o of V converging to ^^^^^ G In order to complete the proof of 



zero such that ]V[ + is invertible, we have 



Proposition IV. 1, we need to show the following lemma. 



lim F^{M + Qk 



0. 



Lemma A.2 

Using the definition of Fv in (15), the following result can t ^ tC^ ^ -r^r-w ... .u ^- 

f xv^' & i^Qi M.FP G T)[l) maxinuzmg the function F^. Then, 



be obtained for all /c > 
F^(M + Qfc) 



N 



VF^{Mfp) = 0, which implies thatMFP is an FP of f^. 

Proof: Since the function F^ defined in (15) differs 
from the one used in [35], a specific analysis is required. By 
definition of JVEpp, one has 



Since -p/P < 0, the conclusion holds true if 1 < z* < A/" 
such that 



FP) 



max Fy(M). 



lim 



1 



1 



oo |M + Qfc| [xf.(M + Qfe)-ix,*]^ 



0. 



By defining ^/{M) = \\M\\^ - 1, one has A/'(Mpp) = 
0. The Lagrange theorem ensures that S/F^CMfp) = 
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Fig. 11. Estimation performance in a real-world setting. Estimated bias and consistency for (a) the Bark.OOOO and (b) the Leaves.0008 settings. Variance of 
(9 versus number of samples A'" for (c) the Bark.OOOO and (d) the Leaves.0008 settings. 



AVA/'(Mfp) = 2AMfp for A > 0. Straightforward com- 
putations lead to 



Appendix B 
Proof of Proposition IV.2 

We start by establishing (PI). Let M, Q e P with M < Q. 
Then, M'^ > Q-^ and, for all 1 < i < A^, we have 



VF^(M).VAA(M) = Tr[VF^(M)V7V(M)] 
= 2F^(M)Tr[M-i (/^(M) - M)] 
= 2F^(M)(Tr[M-V^(M)] - p) 

^ ^ Tr(M-ix,xf ) 



2F^(M) 
2F^(M) 



Z f 1 

N yN 

N \ N 



1-/5 



i=l 
N 



i=l 



N 



p\ =0. 



i=l 



Since V^F;^(Mfp) = 2AMfp, one has 2A 2A ||Mfp|P 
VF^(Mpp).Mpp = which completes the proof of 
Lemma A.2. ■ 



1 



< 



xf M-i X, + (xf M-i X,) 

1 

xf Q-i X. + (xf Q-^ X,) i^J Q"' ^^f 

which proves the property (PI). The reasoning for the case 
with strict inequahties is identical. 

We next turn to the proof of (P2). Using the definition of 
in (16), the following result can be easily obtained 



/x(M) 



^ ( 1 

N \ N 



N 



-1 



N rj. 

Xi X • 



E^f E^ 



(B.29) 



i=i / i=i Hi 
For all unit vector x G such that ||x|| = 1 and all M G P, 



PASCAL et al.\ PARAMETER ESTIMATION FOR MULTIVARIATE GENERALIZED GAUSSIAN DISTRIBUTIONS 



11 



it is well known that 



1 



z^Mz 



mm 



(B.30) 



X^M-lx z^x/O (X^Z)2' 

where the minimum is reached for the vectors z belonging to 
the line generated by x. Moreover, since {3 G (0, 1), one 
has 

1-/3 



x^M-ix, 

1 



1-/3 



/3 



x^M-ix 



. ^ z Mz 

Z^X/O (X^z)2y 

z^Mz^^ 



inf 



VZ^X^^O (X^ Z)2y 

For M,Q G V, after noting that the function /^(M) is 
unchanged if we replace each vector Xi by the normalized 
vector iii = Xi/||xi||, the following results can be obtained 



/x(M + Q) 



P_ 
N 



N 



mm 



Z^ (M + Q) Z 



ll-/5 



N 



^5n(M + Q)^n.nr( min^ 



(nfz)2 
M z z^ Q z 



(nf z)2 („f z)2 



where 



5n(M) = 



lf:(„fM-n,)^ 



N 



i=l 



Assuming /^(Q) = /x(P) ™d using hypothesis (if) yields 
for a\\l<i<N, 

{^fQ-'^,y-^ (Ef=ixjQ-ix,) = 

(xfP-^x,)-/' (E,1ixJP-x,). 

Moreover, assuming that there exists i such that xf x^ ^ 
xf P~-^ x^, i.e., xf Q~-^ x^ < xf P~-^ x^ implies 

(xfQ-^x,)^-'' (Ef=ixjQ-ix,)< 

(xfP-^x,)^-'' (E,1,xJP-1x,) 

which contradicts /^(Q) = /^(P). Thus, /^(Q) = /^(P) 
yields xf Q-^Xj = xf p-^x^, for all 1 < i < A^". As a 
consequence, for all 1 < i < A'^ 

xf (P-I-Q-I)x, =0. 

Since P~^ — >0, the previous equality indicates that 
(P-^ - Q"^)x^ = 0, for all 1 < i < A/". Using hypothesis 
-({H), the claim (C.32) is proved. 

We now move to the proof of Proposition IV. 3. We consider 
Q,P G P such that Q > P and Q ^ P. As shown above, 
we have /^(Q) > /^(P) and /^(Q) ^ /x(P), which implies 
the existence of an index ^ [l, ^1 such that 



More generally 



6, 



1 



N 



(xf^Q-^x.J^-^ (xf,P-^x., 



min[/i(z) + /2(z)] > min/i(z) +min/2(z), 
zeA zeA zeA 

for all functions /i , /2 and set A giving a sense to the previous 
inequality. The same reasoning can be applied to the function 
^n(-) introduced above. Thus, (P2) clearly holds true. It 
remains to study when equality occurs in (P2). The property 
(P2) becomes an equality if and only if, for all 1 < z < A/", 
one has 

z^Mz z^Qz 



.1-/5 



> 0. 



Note that for /3 = 0, this result reduces to what was obtained 
in [35]. Up to a relabel, we may assume that io = I, hence 



/^(Q)>/^(P)+^iXixf 



(C.33) 



mm 

z^ niy^O 



(nf z)2 (nf z)2 



z^ M z . z^ Q z 

z^^n-^O (nf Z)2 z^n.^O {uf z)^' 



mm 



(B.31) 



which was shown in [35] to be true if and only if IVE and Q 
are collinear. ■ 

Appendix C 
Proof of Proposition IV. 3 

The proof of Proposition IV. 3 is similar to the proof of 
Proposition V.3 of [35], even if the function used in this 
paper differs from the one defined in [35]. We first show that 
for all Q, P G P, we have 

If Q > P and /x(Q) = /^(P), then Q = P. (C.32) 

Since Q > P implies P"^ - Q"^ > 0, for all 1 < i < A^, 
we have 

1 1 

> 



x^ P 



which is the same result as the one obtained in [35]. As a 
consequence, the end of the proof of Proposition V.3 derived 
in [35] can be applied to our problem without any change. 
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